Breast cancer, which arises from unchecked breast cell growth, is one of the
leading causes of death in women globally. Early detection and treatment are
the most effective way to avoid breast cancer-related deaths. The proper
classification of malignancies is one of the most significant challenges in the
medical industry. Owing to their high precision and accuracy, machine learning
techniques are extensively employed for identifying and classifying various
forms of cancer. The authors of this review studied numerous data mining
algorithms and implemented them so that clinicians might use them to
detect cancer cells accurately and early. This article introduces several
techniques, including support vector machine (SVM), K star (K*) classifier,
additive regression (AR), back propagation (BP) neural network, and
Bagging. These algorithms are trained on a data set that contains tumor
parameters from breast cancer patients. Comparing the results, the authors
found that SVM and Bagging had the highest precision and accuracy,
respectively. The number of studies that provide machine learning
techniques for breast cancer detection is also assessed.

1. INTRODUCTION 

The data mining industry utilizes computational, statistical, and optimization techniques to "learn" from
historical instances and to find hard-to-discern models in vast, noisy, or complex datasets. These
properties make data mining well suited to medical applications, which depend on complicated proteomic and
genomic measurements [1], [2]. Cancer is diagnosed and identified using data mining techniques such as the
support vector machine, the Bayesian belief network, and the artificial neural network (ANN) [3], [4].
Fluorodeoxyglucose (FDG) positron emission tomography (PET) [5] enables clinicians to analyze the activity of
the tumor in the human body by detecting radiolabeled tracers. In recent years, a variety of machine
learning [6]-[9], bio-inspired computation [10]-[16], and deep learning [17] techniques have been applied to
make medical predictions. Recently, machine learning has expanded to include cancer detection and prognosis.

The survey revealed that some techniques are optimal for testing the usefulness of datasets [18]. 
Various essential data extraction technologies are being improved and deployed in a variety of real-world 


Journal homepage: http://ijeecs.iaescore.com 


7718 i) ISSN: 2502-4752 


applications (e.g., healthcare, bioscience, and industry) to extract useful information from raw data to aid in
decision-making [19]. In machine learning environments, a reasonably large amount of data consisting of
actual medical cases of men diagnosed with prostate cancer who receive medical attention is used for the
systematic comparison of procedures. Machine learning (ML) methods are software programs that predict
an outcome (behavior, the form of a disease, or stock price volatility) based on the conditions that
led to prior events [20], [21]. Breast cancer is the most prevalent malignancy in women worldwide. It
is caused by the uncontrolled growth of specific breast cells. Several methods have been developed for detecting
breast cancer. Mammography [22], a form of breast imaging, is one such method: X-rays are used to examine
the breast tissue. In its first stages, breast cancer is nearly impossible to detect by external examination.
Mammography can detect cancer at an early stage and takes just a few minutes.

Dynamic magnetic resonance imaging (MRI) [23] has been established as a detection strategy for breast
abnormalities. The method forecasts the growth rate of tumor angiogenesis. Magnetic resonance imaging led to
a reduction in contrast metastases in breast cancer patients. Ultrasound is a well-known tool that detects
abnormalities using sound waves traveling through the body [24]. A sound-generating transducer is placed on the
skin, and the reflections of the sound waves from the tissue are captured [25]. Elastography is a recently
developed image-based technology [26]. This method is used when breast cancer tissues are stiffer than the
neighboring normal parenchyma. A sample compression color map distinguishes between benign and malignant types.
Although numerous approaches have been validated, none of them produces an accurate and consistent outcome. In
mammography, doctors must interpret a large amount of image data, which reduces accuracy.

Interpretation also takes time and, in the worst cases, leads to an incorrect diagnosis. This article compares
multiple machine learning algorithms for data-driven disease diagnosis. To detect the condition accurately, six
supervised machine learning approaches were employed. In [27], the breast cancer dataset was classified using
decision tree, naive Bayes, and neural network techniques. The experiment concluded that the neural network
classifies breast tumors with greater sensitivity and accuracy than the decision tree and naive Bayes methods.
A system was developed in [28] to help differentiate between malignant and benign tumors. The backward
elimination (BE) method was combined with a random forest to select features. The dataset
was drawn from the Wisconsin prognostic data. The accuracy of this hybrid approach is roughly
95%, and the number of variables was reduced from 33 to between 17 and 18. Three related
algorithms, namely support vector machine (SVM), ANN, and decision tree (DT), were evaluated in [29]. That
investigation uses a database collected from the Iranian Center for Breast Cancer (ICBC). The SVM algorithm
achieved up to 95% accuracy using a total of 8 predictor variables. Rana [30] developed an ensemble of SVM with
a decision tree (C5.0) model for breast cancer detection. The dataset was produced by integrating 32 attributes
of the Wisconsin prognostic dataset. Rank-based feature selection was used to reduce the number of variables.
With the radial basis function and five features, the accuracy is 92.59%. Several statistical models of
contemporary machine learning technologies used to detect cancer progression were addressed [30].

The author has reviewed numerous ML-related publications in this study. The categories and
classifications in these papers vary with the dataset and its characteristics. The precision on
mammographic data is as high as 83%, while the precision on other datasets is as high as 71%. These papers
frequently use up to 14 mammographic variables. Comparative experiments in [31], [32] featured numerous
machine learning algorithms for breast cancer prediction and diagnosis, including SVM, logistic regression,
Naive Bayes, and k-nearest neighbors (KNN). The analysis used recurrence and non-recurrence data from the
Wisconsin prognostic breast cancer data repository, with reported figures of 95.6% and 68%, respectively.
Several classification algorithms, such as the alternating decision tree (AD tree), the decision tree (J48)
algorithm, and the best first tree (B+ tree), were executed. The dataset was
obtained from the diagnostic center of Swami Vivekananda in Chennai. It contains 220 medical records and is
used to evaluate nine characteristics. The results indicate that J48 performed best, with 99% accuracy [33].

A breast cancer solution was developed that differentiates between various forms of breast cancer.
The method focuses on the diagnosis and prediction of breast cancer using the Wisconsin data, as well as the
identification of multiple types of breast cancer [34]. Using and analyzing two distinct migration topologies in
an island model allows for a more precise and time-efficient training technique. A prognosis of illness status
was presented using a hybrid method for anticipating changes and their repercussions, which are crucial in deadly
diseases [35]. Their strategy for signaling the severity of diseases consists of two primary
components: i) processing and extraction of informative features; and ii) decisions based on the hybrid decision
tree and support vector machine model (DT-SVM) for predictions. To construct accurate predictive models for
breast cancer using data mining approaches, they studied the Wisconsin datasets from the UCI machine learning
repository. Table 1 shows the list of publications comparing parameters, accuracy, and various algorithms for
breast cancer diagnosis.

Table 1. List of publications comparing parameters, accuracy, and various algorithms for breast
cancer diagnosis

| Method | Accuracy | Objective | Features |
|---|---|---|---|
| Neural networks | 96.14% | Implementation of breast cancer prediction classification techniques [25] | 9 |
| Random forest tree | 99% | Random forest classification along with breast cancer detection and prognostic function collection [26] | 17 |
| Neural network, SVM, and DT | 95% | Three machine learning models for predicting breast cancer recurrence [27] | 8 |
| SVM | 92.59% | Breast cancer identification using decision tree and support vector ensemble with reduced function subset [28] | 5 |
| Various predictive models | 83% | Applications for machine learning for cancer prognosis [29] | 14 |
| SVM, logistic regression, Naive Bayes, and KNN | 95.6% | ML models for identifying breast cancer and forecasting recurrence [30] | noes |
| J48, best first tree (B+ tree), and AD tree | 99% | Performance analyzes of breast cancer decision tree algorithms [31] | 4 |
| Island approach to neural network differential | 99.97% | A neural network approach to characterization and breast cancer diagnosis [32] | 9 |
| Support vector machine and decision tree | 95% | Big-data extraction: DT-SVM hybrid platform for breast cancer prediction [33] | 9 |
| Fuzzy inference system | 93% | Breast cancer risk identification Mamdani fuzzy inference method [34] | 4 |
| Support vector machine | 97% | A significance vector machine learning technique is used to classify cancer [35] | 4 |
| Naive Bayes | 92% | Bayes weighted classification: a breast cancer detection predictive plan [36] | 9 |
| CNN | 82.73% | In mammogram imaging applications, convolutional neural networks are used to detect breast cancer [37] | |
| SVM and ANN | 97.14% | An examination of the effect of artificial neural networks and vector-supporting devices on the diagnosis of breast cancer [38] | subset |
| Backpropagation artificial neural network | 91.7% | Textural feature-based mammogram classification using ANN [39] | |
| C-SVM and v-SVM | 97.68% | A vector-based ensemble algorithm supports the detection of breast cancer [40] | |
| RepTree | 81.3% | Particle swarm optimization selection for breast cancer recurrence prediction [41] | |
| Naive Bayes | 80% | (same study [41]) | |
| k-NNs | 715% | (same study [41]) | |
| Normalized multilayer perceptron neural network | 99% | Created a model employing a normalized multilayer perceptron neural network. The findings obtained are excellent [42] | 14 |
| CNN | 98% | For breast cancer classification, a new deep feed-forward NN model with four AFs has been proposed: Swish, hidden layer 1; LeakyReLU, hidden layer 2; ReLU, hidden layer 3; and naturally Sigmoidal final feature layer [43] | 4 |
| SVM | 90% | Four distinct classification models, including SVM, KNN, Naive Bayes, and DT, with characteristics picked at varying threshold levels. The suggested approaches were applied to separate gene expression datasets for performance evaluation and validation [44] | |
| KNN | 87% | (same study [44]) | |
| Naive Bayes | 85% | (same study [44]) | |
| DT | 87% | (same study [44]) | |



Three classification techniques in the Weka program were compared, with DT-SVM being more
predictive than naive Bayes and sequential minimal optimization. Fuzzy logic was utilized in [36] to
detect the presence of breast cancer. That study draws its data from the UCI machine learning repository. The
objective is to detect breast cancer while reducing the number of risk factors considered and the time required
for diagnosis. Linear discriminant analysis (LDA) was used to select features, while the
Mamdani fuzzy method was employed for training. Fuzzy inference was used to evaluate the outcomes, and
the reported accuracy was 93%. Tan et al. [37]
developed an effective machine learning strategy that could improve cancer
classification accuracy. The work consists of two phases. The first selects the essential genes using the
analysis of variance (ANOVA) scoring scheme. The second applies a suitable classifier to the
classification task. The two most effective machine learning classifiers were relevance vector
machine (RVM) learning and fuzzy support vector machine (FSVM) learning. Three data sets, lymphoma,
leukemia, and small round blue cell tumors (SRBCT), were used to compute the test results.
In [38], the performance of a Naive Bayes classifier employing a new weighted approach to breast cancer
classification was explored. Weights are introduced to extend and improve the performance of standard Naive
Bayes models. The weights were assigned using domain knowledge, on a breast cancer dataset from the UCI machine
learning repository. The experiments show that the weighted naive Bayes approach is preferable to the standard
naive Bayes method. A convolutional neural network (CNN)-based approach was developed in [39] to help physicians
diagnose problems rapidly. Using enhanced mammography images, the classifier built a model to detect tumors in
cancer patients. The proposed approach features a high degree of accuracy and a quick diagnostic time. SVM and
ANN were proposed as models for mixed feature sets in [40].
The classification tasks were carried out on distinct combinations of feature subsets by
determining the optimal parameters and dividing the results. The SVM classified the data better,
with an accuracy of 97.1388%, compared to the ANN's accuracy of 96.7096%. Sakri et al. [41] proposed
identifying breast cancer using Laws' texture energy measure (LTEM). A backpropagation
artificial neural network (BPANN) classifies tissue sections as normal, benign, or malignant.
For normal versus abnormal classification, the proposed technique achieves 90.9% sensitivity with 94.4%
accuracy. The figures for benign versus malignant classification are 91.7% and 66.6%, respectively. An
SVM-based ensemble learning model for breast cancer, weighted by the area under the receiver operating
characteristic curve (AUC), was proposed in [42]. Its base models combine C-SVM and v-SVM with six kernel
functions. It was demonstrated that the proposed model
considerably improves breast cancer diagnosis. The accuracy of the model was 97.68%. Another study compared
classifiers without using a feature selection scheme: Naive Bayes, RepTree, and K-NN achieved accuracies of
70%, 76.33%, and 66.33%, respectively. The results were processed
using the Weka platform. With particle swarm optimization (PSO), the four most
useful features for this classification task were determined. With PSO, the accuracy rates
for Naive Bayes, RepTree, and K-NN were 81.3%, 80%, and 75.3%, respectively. The researchers constructed
a model using a normalized multilayer perceptron neural network to accurately identify breast cancer [43], [44].

The obtained results are exceptional (accuracy is 99.27%). In comparison to other studies employing
ANN, this result is extremely encouraging. The Breast Cancer Wisconsin (Original) dataset was used as the
benchmark. To build models for identifying the two types of breast cancer, researchers employed four distinct
classification models, namely SVM, KNN, Naive Bayes, and DT, with attributes chosen at varying threshold
levels [45]. The suggested methodologies were applied to distinct gene expression datasets in order to evaluate
and validate their performance. The SVM algorithm classified breast cancer into triple negative and non-triple
negative subtypes with fewer misclassification errors than the other three algorithms evaluated.


2. METHOD 
2.1. Preprocess of data mining 

ML approaches divide the learning process into two groups: supervised and unsupervised.
In supervised learning, diverse data instances are labeled and used to train the system for optimal
performance. In unsupervised learning, by contrast, no predetermined labels are available, so
the expected results are not known in advance. Classification is one of the most prevalent forms
of supervised learning. It uses historical data to establish a model for future predictions.
In medicine, clinics and hospitals keep huge databases containing patient histories and
symptom diagnoses. Using this knowledge, researchers develop classification models based on historical
cases. Thus, computer-aided medical inference has become simpler, given the sheer volume of
medical data available today [46].
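
As a minimal illustration of supervised classification from labeled historical records, the following Python sketch learns a decision threshold from a handful of labeled cases and applies it to new ones. The data values, the single feature, and the function names `fit_threshold` and `predict` are hypothetical and chosen only for illustration; they are not taken from the reviewed studies.

```python
from statistics import mean

# Hypothetical labeled history: (feature value, label), 1 = malignant, 0 = benign.
history = [(1.0, 0), (1.5, 0), (2.0, 0), (4.0, 1), (4.5, 1), (5.0, 1)]

def fit_threshold(records):
    """Learn a decision threshold as the midpoint between the two class means."""
    benign = [x for x, y in records if y == 0]
    malignant = [x for x, y in records if y == 1]
    return (mean(benign) + mean(malignant)) / 2

def predict(threshold, x):
    """Label a new case by comparing its feature value with the learned threshold."""
    return 1 if x > threshold else 0

threshold = fit_threshold(history)
print(threshold, predict(threshold, 1.2), predict(threshold, 4.8))
```

The labels in `history` play the role of the recorded diagnoses; without them (the unsupervised setting) no such threshold could be fitted directly.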


2.2. Neural network 

An ANN, in simplified terms, is a knowledge-based computational algorithm. The structure and function of an
ANN are modeled on those of the human brain. By observing raw data, it reveals correlations and broad
patterns [47]. The network consists of nodes and weighted connections between them. There are three kinds of
layers in the neural network: input, hidden, and output. Each of these layers is connected to the next by
weighted connections [48]. This research presents a neural network that employs a multilayer perceptron with
back propagation. In the back propagation network, there are three distinct layers (input, hidden, and output)
in which a signal travels in one direction, from the input neurons to the output neurons, without returning to
its source. Figure 1 illustrates the neural network.

Figure 1. Back propagation neural network [48] 
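
The three-layer structure and the back propagation update can be sketched in plain Python as follows. This is an illustrative toy, not the network used in the reviewed work: the class name `TinyMLP`, the layer sizes, the sigmoid activation, the squared-error loss, the learning rate, and the four-sample dataset are all assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """Input layer -> one hidden layer -> single output neuron (assumed sizes)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # The signal flows one way: input -> hidden -> output.
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train_step(self, x, y, lr):
        hidden, out = self.forward(x)
        # Error terms for the loss 0.5 * (out - y)^2, propagated backwards.
        delta_out = (out - y) * out * (1.0 - out)
        delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(self.w2, hidden)]
        for j, h in enumerate(hidden):
            self.w2[j] -= lr * delta_out * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_hidden[j] * xi
            self.b1[j] -= lr * delta_hidden[j]
        self.b2 -= lr * delta_out

def mean_loss(net, data):
    return sum(0.5 * (net.forward(x)[1] - y) ** 2 for x, y in data) / len(data)

# Hypothetical toy data: two features, binary target.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
net = TinyMLP(2, 3)
initial_loss = mean_loss(net, data)
for _ in range(1000):
    for x, y in data:
        net.train_step(x, y, lr=0.5)
final_loss = mean_loss(net, data)
print(initial_loss, final_loss)
```

Only the error is propagated backwards during training; at prediction time the signal still passes through the layers in a single forward direction.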

2.3. K-star classification algorithm 

K-star, often written K*, is an instance-based classifier. Using a similarity function, this approach
determines how closely a new instance is related to the training instances. The method differs from other
instance-based learners in that it uses an entropy-based distance function. A new instance is classified by
assigning it to the class of the pre-classified training instances it most resembles. The underlying
assumption is that similar instances have similar classes [49].
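
The idea of scoring a new instance against every stored training instance can be sketched as below. This is a simplified stand-in in the spirit of K*, not the published algorithm: it assumes numeric attributes, an exponentially decaying similarity, and a hand-picked `scale` parameter, and the function name `kstar_like_predict` and the data are hypothetical.

```python
import math

def kstar_like_predict(train, query, scale=1.0):
    """Sum exp(-distance / scale) over the training instances of each class
    and return the class with the largest total similarity."""
    scores = {}
    for features, label in train:
        dist = sum(abs(a - b) for a, b in zip(features, query))
        scores[label] = scores.get(label, 0.0) + math.exp(-dist / scale)
    return max(scores, key=scores.get)

# Hypothetical training instances: (feature vector, class label).
train = [([1.0, 1.0], "benign"), ([1.2, 0.8], "benign"),
         ([4.0, 4.2], "malignant"), ([3.8, 4.5], "malignant")]

print(kstar_like_predict(train, [1.1, 1.0]))
print(kstar_like_predict(train, [4.1, 4.0]))
```

Because every training instance contributes to the score, nearby instances dominate while distant ones still carry a small weight, which is what distinguishes this family from a plain nearest-neighbor vote.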


2.4. Additive regression 

Additive regression is a meta-classifier that enhances the performance of a regression-based base classifier.
Each iteration fits a model to the residuals left by the classifier in the previous iteration. The
prediction is obtained by adding the predictions of each classifier. Reducing the shrinkage (learning rate)
parameter helps limit overfitting and has a smoothing effect, but increases the learning time [50].
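
The residual-fitting loop can be sketched in a few lines of Python. The sketch assumes a one-dimensional input, decision stumps as the base regressor, and a shrinkage of 0.5; the function names and the data are hypothetical and are not taken from the reviewed work.

```python
def fit_stump(xs, residuals):
    """Find the threshold split that minimizes squared error on the residuals."""
    best = None
    for thr in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, left_mean, right_mean)
    return best[1:]

def fit_additive(xs, ys, n_rounds=20, shrinkage=0.5):
    base = sum(ys) / len(ys)          # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Each round fits a stump to what the current ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((thr, left_mean, right_mean))
        preds = [p + shrinkage * (left_mean if x <= thr else right_mean)
                 for x, p in zip(xs, preds)]
    return base, shrinkage, stumps

def predict_additive(model, x):
    base, shrinkage, stumps = model
    return base + sum(shrinkage * (lm if x <= thr else rm) for thr, lm, rm in stumps)

def mse(model, xs, ys):
    return sum((predict_additive(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data with a step between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
baseline = fit_additive(xs, ys, n_rounds=0)
boosted = fit_additive(xs, ys, n_rounds=20)
print(mse(baseline, xs, ys), mse(boosted, xs, ys))
```

A smaller `shrinkage` makes each round contribute less, so more rounds are needed to reach the same training error, which reflects the trade-off between smoothing and learning time.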


2.5. Bagging 

Bagging is a method for improving the performance of classification algorithms in machine learning.
The technique was described by Leo Breiman, and its name is derived from the term "bootstrap aggregation"
[51]. Given a base set of example data D, a classification algorithm generates a classifier H:
D → {-1, 1} for classification into one of two possible categories. The bagging technique generates a sequence
of classifiers Hm, m = 1, ..., M, from bootstrap resamples of the training set [52]. These classifiers are
combined to create a composite classifier.
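
A minimal sketch of this procedure, assuming one numeric feature, threshold stumps as the base classifier, and a majority vote as the combination rule, is shown below. The function names, the number of models, and the data are hypothetical.

```python
import random

def fit_stump(sample):
    """Pick the threshold and polarity with the fewest errors on (x, label) pairs,
    where labels are in {-1, 1}."""
    best = None
    for thr in sorted({x for x, _ in sample}):
        for polarity in (1, -1):
            errors = sum(1 for x, y in sample
                         if (polarity if x > thr else -polarity) != y)
            if best is None or errors < best[0]:
                best = (errors, thr, polarity)
    _, thr, polarity = best
    return lambda x: polarity if x > thr else -polarity

def bagging(data, n_models=11, seed=0):
    rng = random.Random(seed)
    # Each classifier H_m is trained on a bootstrap resample (drawn with replacement).
    models = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

    def composite(x):
        votes = sum(h(x) for h in models)  # majority vote over H_1 .. H_M
        return 1 if votes > 0 else -1

    return composite

# Hypothetical training set D: (feature value, label in {-1, 1}).
data = [(1.0, -1), (1.5, -1), (2.0, -1), (2.5, -1),
        (4.0, 1), (4.5, 1), (5.0, 1), (5.5, 1)]
H = bagging(data)
print(H(1.2), H(5.2))
```

An odd number of models is used here so that the vote cannot tie.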


2.6. Support vector machine 

The support vector machine method separates the data set into classes using a margin-maximizing hyperplane.
This method is often used in medicine for diagnosis. Given that a data set may admit
many candidate hyperplanes, the SVM algorithm attempts to produce the greatest separation between the groups by
maximizing the margin [53]. Consider the class distribution of this dataset: the Wisconsin breast
cancer dataset cannot be separated perfectly into classes by a single line. Figure 2 depicts this distribution.

A transformation that adds a Z-axis dimension addresses this problem. When the dataset is shown
on the Z-axis, the distinction between the groups becomes readily apparent. The procedure is carried out using
kernels. Polynomial and exponential kernels find a separating boundary in a higher dimension. Figure 3
illustrates the role of kernels in separating such data.

Figure 2. Wisconsin dataset class distribution [52]
Figure 3. Using the kernel for classification [54] 
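
The effect of adding a Z-axis can be illustrated with a small Python sketch. It uses an explicit polynomial feature map z = x^2 rather than a kernel function (a kernel achieves the same effect implicitly), and the data values and the helper name `separable_by_threshold` are hypothetical.

```python
def separable_by_threshold(values, labels):
    """Return True if a single threshold puts each class on its own side.
    This holds when the labels, ordered by value, change at most once."""
    pairs = sorted(zip(values, labels))
    changes = sum(1 for a, b in zip(pairs, pairs[1:]) if a[1] != b[1])
    return changes <= 1

# Hypothetical 1-D data: class 0 sits in the middle, class 1 on both sides.
xs = [-3.0, -2.5, -1.0, -0.5, 0.5, 1.0, 2.5, 3.0]
labels = [1, 1, 0, 0, 0, 0, 1, 1]

zs = [x * x for x in xs]  # lifted coordinate on the added Z axis

print(separable_by_threshold(xs, labels))  # original axis
print(separable_by_threshold(zs, labels))  # after the transformation
```

On the original axis no single cut separates the classes, while on the lifted axis any threshold between 1 and 6.25 does.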


3. RESULTS AND DISCUSSION 

The training data used in this work were taken from the Wisconsin breast cancer data.
This data set is available in the ML repository at UCI [54]. It is multivariate and has
569 instances. Based on 10 characteristics, the cell nuclei in a digital image are classified as malignant or
benign [55]. These characteristics include area; compactness (p*p/a - 1, where p is the perimeter and a is the
area); concave points (the number of concave portions of the contour); concavity (the severity of the concave
portions of the contour); and fractal dimension (a "coastline approximation").
This work uses the Anaconda software as its machine learning
tool. Anaconda is a Python-based open-source distribution that was first released under the new Berkeley
Software Distribution (BSD) license in 2012. It contains several machine learning algorithms and approaches,
including the algorithms covered in this article. Python is a free and open-source language
for scientific computation [56]. The distribution also supports data science, large-scale data management, and
predictive analytics. After comparing the outcomes, the parameters and results of five distinct data mining
models are displayed. The authors found that SVM and K* had the highest accuracy at 98.6%,
followed by Bagging at 97.3%, BP at 97%, and additive regression at 93.6%. In addition, we examine the
number of studies that have developed machine learning algorithms for breast cancer detection. Accuracy is
also referred to as the identification rate. It is defined as the number of instances correctly
identified divided by the total number of instances in the data set.
The accuracy can vary across data sets and is highly dependent on the classification threshold
[57]. Accuracy may be determined by (1).

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$

TP is the number of true positives, whereas TN is the number of true negatives; FP and FN are the false
positives and false negatives. P stands for positive, denoting cancerous cells, whereas N stands for negative,
denoting benign, non-cancerous cells. Precision is frequently equated with
confidence. Precision is determined by the numbers of true positive and false positive instances. Precision
shows the classifier's ability to handle positive cases, but says little about negative cases.
Precision and recall are interdependent [58]. Precision can be determined by (2).

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

Recall is determined by the numbers of true positive and false negative instances. This metric is used in the
medical field because it provides information on how many of the malignant cases are
correctly classified. It measures the model's ability to find all relevant cases in the dataset. Recall can be
calculated by (3). Table 2 compares the accuracy, precision, and recall of five machine learning techniques on
the Wisconsin dataset.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
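
Equations (1) to (3) translate directly into code. The short Python sketch below computes the three measures from confusion-matrix counts, taking precision in its standard form TP/(TP+FP); the function name and the counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # share of predicted positives that are truly positive
    recall = tp / (tp + fn)      # share of actual positives that were found
    return accuracy, precision, recall

# Hypothetical counts for illustration only.
acc, prec, rec = classification_metrics(tp=90, tn=95, fp=5, fn=10)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

In a screening setting a false negative is a missed malignant case, which is why recall is reported alongside accuracy.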


Table 2. Comparison of the accuracy, precision, and recall of five machine learning techniques on the
Wisconsin dataset
Comparison of Wisconsin breast cancer classification algorithms

| Method | Accuracy | Precision | Recall |
|---|---|---|---|
| Support vector machine | 98.6% | 97.5% | 96.9% |
| K star | 98.6% | 97.3% | 97% |
| Bagging | 97.3% | 97.2% | 96.8% |
| Neural network | 97% | 97.2% | 96.9% |
| Additive regression | 93.5% | 96.2% | 93.2% |


4. CONCLUSION 

Breast cancer is the most common form of cancer in the world. A woman chosen at random
has a 13% probability of being diagnosed with the condition. A significant number of lives can be saved through
the early identification of breast cancer. SVM, K*, BP, Bagging, and additive regression are the five machine
learning methods discussed in this article for predicting breast cancer. We examined these five
data mining methodologies and rated them on their recall, precision, and accuracy. The
efficacy of the various algorithms was also compared, using the Wisconsin dataset as the basis. In their
research, the authors found that SVM and K* have the highest accuracy.